Goto

Collaborating Authors

 agent state




Periodic agent-state based Q-learning for POMDPs

Neural Information Processing Systems

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.


Periodic agent-state based Q-learning for POMDPs

Neural Information Processing Systems

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP . However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.




Convergence of regularized agent-state-based Q-learning in POMDPs

Sinha, Amit, Geist, Matthieu, Mahajan, Aditya

arXiv.org Artificial Intelligence

In this paper, we present a framework to understand the convergence of commonly used Q-learning reinforcement learning algorithms in practice. Two salient features of such algorithms are: (i)~the Q-table is recursively updated using an agent state (such as the state of a recurrent neural network) which is not a belief state or an information state and (ii)~policy regularization is often used to encourage exploration and stabilize the learning algorithm. We investigate the simplest form of such Q-learning algorithms which we call regularized agent-state-based Q-learning (RASQL) and show that it converges under mild technical conditions to the fixed point of an appropriately defined regularized MDP, which depends on the stationary distribution induced by the behavioral policy. We also show that a similar analysis continues to work for a variant of RASQL that learns periodic policies. We present numerical examples to illustrate that the empirical convergence behavior matches with the proposed theoretical limit.


Allen: Rethinking MAS Design through Step-Level Policy Autonomy

Zhou, Qiangong, Wang, Zhiting, Yao, Mingyou, Liu, Zongyang

arXiv.org Artificial Intelligence

We introduce a new Multi-Agent System (MAS) - Allen, designed to address two core challenges in current MAS design: (1) improve system's policy autonomy, empowering agents to dynamically adapt their behavioral strategies, and (2) achieving the trade-off between collaborative efficiency, task supervision, and human oversight in complex network topologies. Our core insight is to redefine the basic execution unit in the MAS, allowing agents to autonomously form different patterns by combining these units. We have constructed a four-tier state architecture (Task, Stage, Agent, Step) to constrain system behavior from both task-oriented and execution-oriented perspectives. This achieves a unification of topological optimization and controllable progress. Allen grants unprecedented Policy Autonomy, while making a trade-off for the controllability of the collaborative structure. The project code has been open source at: https://github.com/motern88/Allen


InfGen: Scenario Generation as Next Token Group Prediction

Peng, Zhenghao, Liu, Yuxin, Zhou, Bolei

arXiv.org Artificial Intelligence

Realistic and interactive traffic simulation is essential for training and evaluating autonomous driving systems. However, most existing data-driven simulation methods rely on static initialization or log-replay data, limiting their ability to model dynamic, long-horizon scenarios with evolving agent populations. We propose InfGen, a scenario generation framework that outputs agent states and trajectories in an autoregressive manner. InfGen represents the entire scene as a sequence of tokens, including traffic light signals, agent states, and motion vectors, and uses a transformer model to simulate traffic over time. This design enables InfGen to continuously insert new agents into traffic, supporting infinite scene generation. Experiments demonstrate that InfGen produces realistic, diverse, and adaptive traffic behaviors. Furthermore, reinforcement learning policies trained in InfGen-generated scenarios achieve superior robustness and generalization, validating its utility as a high-fidelity simulation environment for autonomous driving. More information is available at https://metadriverse.github.io/infgen/.


Periodic agent-state based Q-learning for POMDPs

Neural Information Processing Systems

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy.